Session 5 Practice: Manipulating datasets

Note: There are often multiple ways to answer each question.

Install and load the MASS and dplyr packages. Load the nlschools dataset.
How can we find a description of the nlschools dataset? Why is the class column a factor and not a numeric variable? Use some of the functions we learned to get a feel for the data.
How many students are there in the dataset?
Create a new dataset which consists of students with verbal IQ >= 17.5.
Create a new dataset which consists of students whose SES score is < 37 and whose class ID is 2980.
How many students had a language test score of more than 50?
How many students were there in each class? Which class had the most students?
Create a new column named pass which takes the value “pass” if lang >= 40 and “fail” otherwise. Save the dataset with the new column in a variable named nlschools2, then show the first 10 rows of the dataset. (Hint: The ifelse function will be handy.)
Your colleague hypothesizes that there is a strong relationship between IQ and socio-economic status (SES). Create a data frame which shows the mean IQ and mean language test score of students for each SES value, and present the results in descending order of SES. Make plots of mean IQ vs. SES and mean language score vs. SES.
Get a random sample of 10 rows from the dataset. (Hint: Look at the documentation for the sample_n function in the dplyr package.)
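
Illustration for the hints: the sketch below shows how the two hinted functions, ifelse and sample_n, behave on a small made-up data frame. The names toy, id, score, band and picked are hypothetical and have nothing to do with nlschools, so this shows the functions only and is not a solution to any question above.

```r
library(dplyr)

# Toy data frame; column names and values are made up for illustration.
toy <- data.frame(id = 1:6, score = c(12, 55, 40, 38, 71, 40))

# ifelse() is vectorised: it tests every element of `score` and returns
# a vector of the same length, picking "high" or "low" element by element.
toy <- toy %>% mutate(band = ifelse(score >= 40, "high", "low"))

# sample_n() draws a fixed number of rows at random
# (without replacement unless replace = TRUE is given).
set.seed(1)  # fixed seed so the same rows are drawn on every run
picked <- sample_n(toy, 3)

print(toy)
print(picked)
```

Because ifelse works on whole columns, it can be used directly inside mutate without a loop. Removing the set.seed call gives a different sample on each run.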